# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons

All modules and libraries are imported. The CSV containing raw data from ProFootballReference.com is read with the pandas `read_csv` function and stored as `df`.

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    from statsmodels.graphics.factorplots import interaction_plot
    from scipy import stats
    import matplotlib.colors as mcolors
    from scipy.stats import rankdata
    import seaborn as sns

    df = pd.read_csv(r'C:\Users\Rob\Documents\dstats.csv')
    print(df)

Tm G Cmp Att Cmp% Yds TD TD% Int PD ... \
0 Arizona Cardinals 17 367 561 65.4 3645 30 5.3 13 73 ...
1 Atlanta Falcons 17 391 577 67.8 3952 31 5.4 12 77 ...
2 Baltimore Ravens 17 397 621 63.9 4742 31 5.0 9 72 ...
3 Buffalo Bills 17 297 530 56.0 2771 12 2.3 19 80 ...
4 Carolina Panthers 17 337 515 65.4 3266 26 5.0 9 52 ...
.. ... .. ... ... ... ... .. ... ... .. ...
91 San Francisco 49ers19 16 318 519 61.3 2707 23 4.4 12 75 ...
92 Seattle Seahawks19 16 383 598 64.0 4223 19 3.2 16 74 ...
93 Tampa Bay Buccaneers19 16 408 664 61.4 4322 30 4.5 12 96 ...
94 Tennessee Titans19 16 386 598 64.5 4080 25 4.2 14 72 ...
95 Washington Redskins19 16 371 540 68.7 3823 35 6.5 13 52 ...
Hrry Hrry% QBKD QBKD% aSk Prss Prss% MTkl PA PAA
0 61 9.80% 60 10.70% 41 162 25.90% 110 366 384.1
1 48 7.60% 39 6.80% 18 105 16.70% 120 459 384.1
2 58 8.60% 62 10.00% 34 154 23.00% 115 392 384.1
3 93 15.40% 51 9.60% 42 186 30.80% 118 289 384.1
4 62 10.90% 48 9.30% 39 149 26.10% 106 404 384.1
.. ... ... ... ... ... ... ... ... ... ...
91 88 14.70% 36 6.90% 48 172 28.70% 107 310 384.1
92 60 9.20% 38 6.40% 28 126 19.30% 131 398 384.1
93 62 8.50% 66 9.90% 47 175 23.90% 118 449 384.1
94 72 10.70% 27 4.50% 43 142 21.10% 110 331 384.1
95 83 13.60% 45 8.30% 46 174 28.50% 116 435 384.1
[96 rows x 49 columns]

Find the mean of all points allowed (`PA`). The mean is added back to the df (as the `PAA` column) to use as the baseline later.

    np.mean(df['PA'])

384.1041666666667

Check that all the cells in the dataframe are filled.

    df.describe()

| | G | Cmp | Att | Cmp% | Yds | TD | TD% | Int | PD | Int% | ... | Air | aYAC | Bltz | Hrry | QBKD | aSk | Prss | MTkl | PA | PAA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 96.00000 | 96.000000 | 96.000000 | 96.000000 | 96.000000 | ... | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 9.600000e+01 |
| mean | 16.333333 | 366.750000 | 568.572917 | 64.473958 | 3827.718750 | 26.12500 | 4.606250 | 12.968750 | 68.968750 | 2.288542 | ... | 2294.708333 | 1933.708333 | 178.468750 | 62.020833 | 49.697917 | 38.072917 | 149.791667 | 107.989583 | 384.104167 | 3.841000e+02 |
| std | 0.473879 | 33.964688 | 41.337060 | 3.149812 | 428.987614 | 5.55783 | 0.991417 | 4.027414 | 11.484042 | 0.720507 | ... | 307.854850 | 276.118202 | 50.766328 | 13.440692 | 11.606822 | 9.128607 | 23.402167 | 16.409557 | 59.334463 | 7.428436e-13 |
| min | 16.000000 | 297.000000 | 464.000000 | 56.000000 | 2707.000000 | 12.00000 | 2.300000 | 3.000000 | 43.000000 | 0.600000 | ... | 1707.000000 | 1354.000000 | 72.000000 | 35.000000 | 27.000000 | 17.000000 | 95.000000 | 66.000000 | 225.000000 | 3.841000e+02 |
| 25% | 16.000000 | 343.750000 | 541.000000 | 62.675000 | 3577.500000 | 22.00000 | 3.900000 | 10.000000 | 60.750000 | 1.800000 | ... | 2108.000000 | 1734.250000 | 143.750000 | 52.000000 | 41.000000 | 31.750000 | 135.500000 | 96.750000 | 351.000000 | 3.841000e+02 |
| 50% | 16.000000 | 368.000000 | 562.000000 | 64.050000 | 3822.000000 | 26.00000 | 4.600000 | 12.500000 | 70.000000 | 2.250000 | ... | 2258.000000 | 1906.000000 | 168.500000 | 59.500000 | 50.000000 | 39.000000 | 148.000000 | 109.000000 | 373.000000 | 3.841000e+02 |
| 75% | 17.000000 | 390.250000 | 598.000000 | 66.650000 | 4123.000000 | 30.00000 | 5.225000 | 15.000000 | 77.250000 | 2.625000 | ... | 2487.250000 | 2071.500000 | 206.750000 | 71.000000 | 56.000000 | 46.000000 | 165.250000 | 119.000000 | 426.000000 | 3.841000e+02 |
| max | 17.000000 | 450.000000 | 680.000000 | 70.700000 | 4742.000000 | 39.00000 | 7.200000 | 26.000000 | 96.000000 | 4.700000 | ... | 3118.000000 | 2793.000000 | 329.000000 | 95.000000 | 80.000000 | 56.000000 | 219.000000 | 143.000000 | 519.000000 | 3.841000e+02 |
8 rows × 44 columns
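
`describe()` reports a count of 96 for each numeric column, which indicates no missing values there, but by default it skips non-numeric columns. A more direct check, sketched here on a small made-up frame (the values are illustrative and not taken from dstats.csv), counts the missing cells in every column with `isna()`:

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only; the values are not from the real data.
toy = pd.DataFrame({
    'Tm': ['Team A', 'Team B', 'Team C'],
    'PA': [366.0, np.nan, 392.0],
    'Prss%': ['25.90%', '16.70%', None],
})

# Number of missing cells in each column, including non-numeric ones.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# True only if every cell in the frame is filled.
all_filled = not toy.isna().any().any()
print(all_filled)
```

Run against the real `df`, a result of all zeros would confirm that every cell is filled.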

One-hot encoding turns the categorical data into numbers so that the machine learning model can use it.

    df = pd.get_dummies(df)
    df.iloc[:, 5:].head(5)

| | TD | TD% | Int | PD | Int% | Y/A | AY/A | Y/C | Y/G | Rate | ... | Prss%_26.80% | Prss%_27.50% | Prss%_27.60% | Prss%_27.90% | Prss%_28.50% | Prss%_28.60% | Prss%_28.70% | Prss%_30.50% | Prss%_30.80% | Prss%_35.10% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 5.3 | 13 | 73 | 2.3 | 6.9 | 6.9 | 10.6 | 214.4 | 93.5 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 31 | 5.4 | 12 | 77 | 2.1 | 7.1 | 7.3 | 10.5 | 232.5 | 97.4 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 31 | 5.0 | 9 | 72 | 1.4 | 8.0 | 8.4 | 12.6 | 278.9 | 99.4 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 12 | 2.3 | 19 | 80 | 3.6 | 5.7 | 4.6 | 10.2 | 163.0 | 65.3 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 26 | 5.0 | 9 | 52 | 1.7 | 6.9 | 7.1 | 10.6 | 192.1 | 95.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 396 columns
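
The `Prss%_26.80%`-style columns above suggest that some percentage columns were read as text (for example `'9.80%'`), so `get_dummies` gave each distinct value its own column. One possible alternative, sketched below on made-up values, is to convert such columns to floats before encoding. The helper name `percent_to_float` is hypothetical and not part of the original notebook.

```python
import pandas as pd

def percent_to_float(column: pd.Series) -> pd.Series:
    """Convert strings such as '9.80%' to floats such as 9.8."""
    return column.str.rstrip('%').astype(float)

# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({
    'Tm': ['Team A', 'Team B'],
    'Hrry%': ['9.80%', '7.60%'],
    'Prss%': ['25.90%', '16.70%'],
})

# Convert every column whose name ends in '%' and still holds text.
for name in toy.columns:
    if name.endswith('%') and not pd.api.types.is_numeric_dtype(toy[name]):
        toy[name] = percent_to_float(toy[name])

# Only 'Tm' is still categorical, so only it is one-hot encoded.
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())
```

With this conversion each percentage stays a single numeric feature instead of dozens of indicator columns.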

### Choose what is to be predicted

Define the variable to predict as `labels`. Then remove that column from the dataframe and convert the dataframe to an array.

    labels = np.array(df['PA'])
    df = df.drop('PA', axis=1)
    df_list = list(df.columns)
    df = np.array(df)

Import the `train_test_split` function from sklearn.

    from sklearn.model_selection import train_test_split

Create the training and testing splits, setting the size of the testing data to 25% of the rows.

    train_df, test_df, train_labels, test_labels = train_test_split(df, labels, test_size=0.25, random_state=42)

Print the shapes of the splits created above to make sure there are no errors in them.

    print('Training df Shape:', train_df.shape)
    print('Training Labels Shape:', train_labels.shape)
    print('Testing df Shape:', test_df.shape)
    print('Testing Labels Shape:', test_labels.shape)

Training df Shape: (72, 400)
Training Labels Shape: (72,)
Testing df Shape: (24, 400)
Testing Labels Shape: (24,)

### Baseline

Use the PAA (Points Against Average) created earlier as the baseline prediction, and measure the baseline error against the testing data.

    baseline_preds = test_df[:, df_list.index('PAA')]
    baseline_errors = abs(baseline_preds - test_labels)
    print('Average baseline error: ', round(np.mean(baseline_errors), 2))

Average baseline error: 53.93
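
The baseline predicts the same value, the overall mean of `PA`, for every team. Because `PAA` was computed over all 96 rows, it also includes the test rows; a stricter variant would take the mean of the training labels only. A minimal sketch with made-up numbers (the arrays are illustrative stand-ins for `train_labels` and `test_labels`):

```python
import numpy as np

# Illustrative stand-ins for train_labels and test_labels.
toy_train_labels = np.array([300.0, 400.0, 350.0, 450.0])
toy_test_labels = np.array([380.0, 330.0])

# Constant baseline: predict the training mean for every test row.
baseline_value = toy_train_labels.mean()
toy_baseline_preds = np.full(toy_test_labels.shape, baseline_value)

# Mean absolute error of the constant baseline.
toy_baseline_mae = np.mean(np.abs(toy_baseline_preds - toy_test_labels))
print(round(toy_baseline_mae, 2))
```

Any model worth keeping should produce a lower mean absolute error than this constant prediction.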

### Random Forest Regression

Import `RandomForestRegressor` from sklearn. Then set the number of decision trees required, in this case 1000, and train the model on the training set created above.

    from sklearn.ensemble import RandomForestRegressor
    rf = RandomForestRegressor(n_estimators=1000, random_state=42)
    rf.fit(train_df, train_labels);

Now make the predictions based on the testing data and calculate the mean absolute error.

    predictions = rf.predict(test_df)
    errors = abs(predictions - test_labels)
    print('Mean Absolute Error:', round(np.mean(errors), 2), 'points.')

Mean Absolute Error: 27.58 points.

Now find the accuracy of the predictions by calculating the mean absolute percentage error (MAPE) and subtracting it from 100.

    mape = 100 * (errors / test_labels)
    accuracy = 100 - np.mean(mape)
    print('Accuracy:', round(accuracy, 2), '%.')

Accuracy: 92.56 %.
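
The accuracy figure is 100 minus the mean absolute percentage error, so it measures how close predictions are relative to the size of the actual values. A small worked sketch on made-up numbers (the function name `mape_accuracy` is hypothetical, not from the notebook):

```python
import numpy as np

def mape_accuracy(actual, predicted):
    """Return 100 minus the mean absolute percentage error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Percentage error per row; actual values must be non-zero.
    mape = 100 * np.abs(predicted - actual) / actual
    return 100 - np.mean(mape)

# Made-up example: percentage errors of 10% and 5% average to 7.5%.
example_accuracy = mape_accuracy([400, 200], [360, 210])
print(example_accuracy)
```

Read this way, an accuracy of 92.56% means the predictions miss the actual points allowed by about 7.4% on average.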

### List of Importances

Make a list of the variables and their importance in the prediction. The list is then sorted from most important to least.

    importances = list(rf.feature_importances_)
    feature_importances = [(feature, round(importance, 2)) for feature, importance in zip(df_list, importances)]
    feature_importances = sorted(feature_importances, key=lambda x: x[1], reverse=True)
    for pair in feature_importances:
        print('Variable: {:20} Importance: {}'.format(*pair))

Variable: ANY/A Importance: 0.24
Variable: Rate Importance: 0.19
Variable: EXP Importance: 0.09
Variable: AY/A Importance: 0.06
Variable: NY/A Importance: 0.04
Variable: TD Importance: 0.03
Variable: Attr Importance: 0.03
Variable: Ydsr Importance: 0.03
Variable: TDr Importance: 0.03
Variable: RY/G Importance: 0.03
Variable: aTD Importance: 0.03
Variable: Cmp Importance: 0.01
Variable: Cmp% Importance: 0.01
Variable: Yds Importance: 0.01
Variable: TD% Importance: 0.01
Variable: Y/G Importance: 0.01
Variable: Yds.1 Importance: 0.01
Variable: QBHits Importance: 0.01
Variable: REXP Importance: 0.01
Variable: aCmp Importance: 0.01
Variable: aYds Importance: 0.01
Variable: aDADOT Importance: 0.01
Variable: aYAC Importance: 0.01
Variable: Bltz Importance: 0.01
Variable: MTkl Importance: 0.01
Variable: G Importance: 0.0
Variable: Att Importance: 0.0
Variable: Int Importance: 0.0
Variable: PD Importance: 0.0
Variable: Int% Importance: 0.0
Variable: Y/A Importance: 0.0
Variable: Y/C Importance: 0.0
Variable: Sk Importance: 0.0
Variable: TFL Importance: 0.0
Variable: Sk% Importance: 0.0
Variable: RY/A Importance: 0.0
Variable: aAtt Importance: 0.0
Variable: Air Importance: 0.0
Variable: Hrry Importance: 0.0
Variable: QBKD Importance: 0.0
Variable: aSk Importance: 0.0
Variable: Prss Importance: 0.0
Variable: PAA Importance: 0.0

(The output continues with every one-hot encoded column, `Tm_*`, `Bltz%_*`, `Hrry%_*`, `QBKD%_*`, and `Prss%_*`, each with Importance: 0.0.)
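
With 400 columns, printing every importance makes the informative entries hard to find. One alternative, sketched below, is to hold the importances in a pandas Series so they can be rounded, filtered, and sorted in one step. The sample names and values are hand-copied from the output above, and the variable names are hypothetical.

```python
import pandas as pd

# Names and importances as parallel lists, in the way df_list and
# rf.feature_importances_ would provide them. The values here are a
# small hand-copied sample, not the full model output.
sample_names = ['PAA', 'ANY/A', 'Tm_Buffalo Bills', 'Rate', 'EXP']
sample_importances = [0.0, 0.24, 0.0, 0.19, 0.09]

importance_series = pd.Series(sample_importances, index=sample_names)

# Keep only features with non-zero rounded importance, largest first.
top_features = (
    importance_series.round(2)
    .loc[lambda s: s > 0]
    .sort_values(ascending=False)
)
print(top_features)
```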

# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons

All modules and libraries are imported. The CSV containing raw data from ProFootballReference.com is read with the pandas `read_csv` function and stored as `df`.

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    from statsmodels.graphics.factorplots import interaction_plot
    from scipy import stats
    import matplotlib.colors as mcolors
    from scipy.stats import rankdata
    import seaborn as sns

    df = pd.read_csv(r'C:\Users\Rob\Documents\Prediction.csv')
    print(df)

Teams G Pts Prediction Prediction/G Pts/G P/G*17
0 Arizona Cardinals 7 156 419.10 24.65 22.3 378.86
1 Atlanta Falcons 7 163 354.03 20.83 23.3 395.86
2 Baltimore Ravens 7 181 417.33 24.55 25.9 439.57
3 Buffalo Bills 6 176 507.74 29.87 29.3 498.67
4 Carolina Panthers 7 124 257.84 15.17 17.7 301.14
5 Chicago Bears 6 93 276.56 16.27 15.5 263.50
6 Cincinnati Bengals 7 173 433.40 25.49 24.7 420.14
7 Cleveland Browns 7 168 464.37 27.32 24.0 408.00
8 Dallas Cowboys 7 134 319.45 18.79 19.1 325.43
9 Denver Broncos 7 100 339.80 19.99 14.3 242.86
10 Detroit Lions 6 146 347.52 20.44 24.3 413.67
11 Green Bay Packers 7 128 375.83 22.11 18.3 310.86
12 Houston Texans 6 106 346.00 20.35 17.7 300.33
13 Indianapolis Colts 7 113 397.78 23.40 16.1 274.43
14 Jacksonville Jaguars 7 155 417.01 24.53 22.1 376.43
15 Kansas City Chiefs 7 223 462.52 27.21 31.9 541.57
16 Las Vegas Raiders 6 163 443.20 26.07 27.2 461.83
17 Los Angeles Chargers 7 164 429.25 25.25 23.4 398.29
18 Los Angeles Rams 6 104 336.85 19.81 17.3 294.67
19 Miami Dolphins 7 147 420.28 24.72 21.0 357.00
20 Minnesota Vikings 6 139 389.33 22.90 23.2 393.83
21 New England Patriots 6 141 404.85 23.81 23.5 399.50
22 New Orleans Saints 7 175 465.84 27.40 25.0 425.00
23 New York Giants 7 150 391.79 23.05 21.4 364.29
24 New York Jets 7 159 421.97 24.82 22.7 386.14
25 Philadelphia Eagles 6 161 432.22 25.42 26.8 456.17
26 Pittsburgh Steelers 7 107 318.67 18.75 15.3 259.86
27 San Francisco 49ers 7 145 412.43 24.26 20.7 352.14
28 Seattle Seahawks 7 183 396.31 23.31 26.1 444.43
29 Tampa Bay Buccaneers 7 124 383.57 22.56 17.7 301.14
30 Tennessee Titans 6 115 275.78 16.22 19.2 279.29
31 Washington Commanders 7 125 372.02 21.88 17.9 303.57

### Scatter Graph

Scatter points scored per game against predicted points scored per game.

    plt.scatter(df['Pts/G'], df['Prediction/G'], c=df['Prediction/G'], cmap='Reds')
    plt.xlabel('Pts/G')
    plt.ylabel('Prediction/G')
    plt.show()

Define the columns to make the scripting quicker.

    pg17 = df['P/G*17']
    pg17 = np.array(pg17).reshape((-1, 1))
    pre = df['Prediction']
    preg = df['Prediction/G']
    preg = np.array(preg).reshape(-1, 1)
    pg = df['Pts/G']

### Linear Regression

Create and perform a linear regression on the per game data.

    model = LinearRegression().fit(preg, pg)
    r_sq = model.score(preg, pg)
    print(f"coefficient of determination: {r_sq}")
    print(f"intercept: {model.intercept_}")
    print(f"slope: {model.coef_}")

coefficient of determination: 0.5473165407302725
intercept: 0.7686812178132065
slope: [0.91671526]
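
For a regression with a single predictor, the coefficient of determination equals the square of the Pearson correlation between predictor and response, and the slope and intercept have closed forms. The sketch below checks this with NumPy on made-up numbers (the arrays `x` and `y` are illustrative, not the notebook's data):

```python
import numpy as np

# Made-up per-game values for illustration.
x = np.array([15.0, 20.0, 22.0, 25.0, 30.0])  # predicted points per game
y = np.array([17.0, 19.0, 24.0, 23.0, 29.0])  # actual points per game

# Least-squares slope and intercept for y = intercept + slope * x.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# R^2 from the residuals of the fitted line.
residuals = y - (intercept + slope * x)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]
print(slope, intercept, r_squared, r ** 2)
```

The same relationship shows up in the notebook's numbers: the correlation of 0.7398 reported in the comparison section squares to roughly 0.5473, close to the coefficient of determination printed here.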

    yr_pro = df['Pts/G'] * 17
    preg = df['Prediction'] / 17

### Projection and Prediction Comparison

First display the difference between the prediction and the projected points allowed. In this model it is appropriate to do this for both full-year and per-game data to test that they share the same correlation coefficient. The per-game data is more valuable because it makes it easier to judge how different the predictions and projections are, as points in the NFL are most often scored in increments of 3 or 7. Increments of 2, 6, and 8 also occur, but they are less common and situational, which this model cannot control for. The sum of absolute differences (SAD) and the sum of squared differences (SSD) are also reported.

print("difference:", pre - yr_pro)
print("SAD:", np.sum(np.abs(pre - yr_pro)))
print("SSD:", np.sum(np.square(pre - yr_pro)))
print("correlation:", np.corrcoef(np.array((pre, yr_pro)))[0, 1])

difference:
0      40.00
1     -42.07
2     -22.97
3       9.64
4     -43.06
5      13.06
6      13.50
7      56.37
8      -5.25
9      96.70
10    -65.58
11     64.73
12     45.10
13    124.08
14     41.31
15    -79.78
16    -19.20
17     31.45
18     42.75
19     63.28
20     -5.07
21      5.35
22     40.84
23     27.99
24     36.07
25    -23.38
26     58.57
27     60.53
28    -47.39
29     82.67
30    -50.62
31     67.72
dtype: float64
SAD: 1426.0800000000002
SSD: 87611.2276
correlation: 0.7398088722661882
print("difference:", preg - pg)
print("Sum All Differences:", np.sum(np.abs(preg - pg)))
print("Sum Squared Differences:", np.sum(np.square(preg - pg)))
print("correlation:", np.corrcoef(np.array((preg, pg)))[0, 1])

difference:
0     2.352941
1    -2.474706
2    -1.351176
3     0.567059
4    -2.532941
5     0.768235
6     0.794118
7     3.315882
8    -0.308824
9     5.688235
10   -3.857647
11    3.807647
12    2.652941
13    7.298824
14    2.430000
15   -4.692941
16   -1.129412
17    1.850000
18    2.514706
19    3.722353
20   -0.298235
21    0.314706
22    2.402353
23    1.646471
24    2.121765
25   -1.375294
26    3.445294
27    3.560588
28   -2.787647
29    4.862941
30   -2.977647
31    3.983529
dtype: float64
Sum All Differences: 83.88705882352943
Sum Squared Differences: 303.1530366782007
correlation: 0.7398088722661882
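Dividing both series by 17 rescales the differences but not the correlation: the sum of absolute differences shrinks by a factor of 17, the sum of squared differences by 17² = 289, and the Pearson correlation is unchanged. That matches the two outputs here, since 1426.08 / 17 ≈ 83.89 and 87611.23 / 289 ≈ 303.15. A small sketch on the first five teams from the table; the variable names are illustrative:

```python
import numpy as np

# First five rows of the Prediction and Pts/G columns (illustrative subset).
pred_season = np.array([419.10, 354.03, 417.33, 507.74, 257.84])
pts_per_game = np.array([22.3, 23.3, 25.9, 29.3, 17.7])

# Full-season view: scale the per-game actuals up to 17 games.
proj_season = pts_per_game * 17
# Per-game view: scale the season prediction down to one game.
pred_per_game = pred_season / 17

sad_season = np.sum(np.abs(pred_season - proj_season))
sad_game = np.sum(np.abs(pred_per_game - pts_per_game))
ssd_season = np.sum(np.square(pred_season - proj_season))
ssd_game = np.sum(np.square(pred_per_game - pts_per_game))

corr_season = np.corrcoef(pred_season, proj_season)[0, 1]
corr_game = np.corrcoef(pred_per_game, pts_per_game)[0, 1]

print("SAD ratio:", sad_season / sad_game)
print("SSD ratio:", ssd_season / ssd_game)
print("correlations:", corr_season, corr_game)
```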
### New DataFrame
Make a new DataFrame consisting of the teams and, for each, the predicted points per game minus the actual points per game so far.
Teams = ('Atlanta Falcons', 'Buffalo Bills', 'Carolina Panthers', 'Chicago Bears',
         'Cincinnati Bengals', 'Cleveland Browns', 'Indianapolis Colts', 'Arizona Cardinals',
         'Dallas Cowboys', 'Denver Broncos', 'Detroit Lions', 'Green Bay Packers',
         'Houston Texans', 'Jacksonville Jaguars', 'Kansas City Chiefs', 'Miami Dolphins',
         'Minnesota Vikings', 'New Orleans Saints', 'New England Patriots', 'New York Giants',
         'New York Jets', 'Tennessee Titans', 'Philadelphia Eagles', 'Pittsburgh Steelers',
         'Las Vegas Raiders', 'Los Angeles Rams', 'Baltimore Ravens', 'Los Angeles Chargers',
         'Seattle Seahawks', 'San Francisco 49ers', 'Tampa Bay Buccaneers', 'Washington Commanders')
df1 = pd.DataFrame(columns=['Teams'])
df1['Teams'] = Teams
# Values are paired with names by position: diff follows the row order of df,
# while Teams follows the order typed above. The two orders must match for
# each team to receive its own value.
diff = preg - pg
df1['P/G O Diff'] = diff
df1_sorted = df1.sort_values('Teams').reset_index(drop=True)
df1_sorted

|  | Teams | P/G O Diff |
|---|---|---|
| 0 | Arizona Cardinals | 3.315882 |
| 1 | Atlanta Falcons | 2.352941 |
| 2 | Baltimore Ravens | 3.445294 |
| 3 | Buffalo Bills | -2.474706 |
| 4 | Carolina Panthers | -1.351176 |
| 5 | Chicago Bears | 0.567059 |
| 6 | Cincinnati Bengals | -2.532941 |
| 7 | Cleveland Browns | 0.768235 |
| 8 | Dallas Cowboys | -0.308824 |
| 9 | Denver Broncos | 5.688235 |
| 10 | Detroit Lions | -3.857647 |
| 11 | Green Bay Packers | 3.807647 |
| 12 | Houston Texans | 2.652941 |
| 13 | Indianapolis Colts | 0.794118 |
| 14 | Jacksonville Jaguars | 7.298824 |
| 15 | Kansas City Chiefs | 2.430000 |
| 16 | Las Vegas Raiders | 2.121765 |
| 17 | Los Angeles Chargers | 3.560588 |
| 18 | Los Angeles Rams | -1.375294 |
| 19 | Miami Dolphins | -4.692941 |
| 20 | Minnesota Vikings | -1.129412 |
| 21 | New England Patriots | 2.514706 |
| 22 | New Orleans Saints | 1.850000 |
| 23 | New York Giants | 3.722353 |
| 24 | New York Jets | -0.298235 |
| 25 | Philadelphia Eagles | 2.402353 |
| 26 | Pittsburgh Steelers | 1.646471 |
| 27 | San Francisco 49ers | 4.862941 |
| 28 | Seattle Seahawks | -2.787647 |
| 29 | Tampa Bay Buccaneers | -2.977647 |
| 30 | Tennessee Titans | 0.314706 |
| 31 | Washington Commanders | 3.983529 |
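Assigning a separately typed tuple of names pairs each value with a team purely by position, so the result is only correct when both sequences are in the same order. One way to avoid relying on position, sketched here on three rows from the table printed at the start, is to take the names from the same DataFrame as the values. `diff_df` is a hypothetical name, and this is an alternative approach rather than the notebook's own:

```python
import pandas as pd

# Three rows copied from the printed table (illustrative subset).
df = pd.DataFrame({
    "Teams": ["Arizona Cardinals", "Atlanta Falcons", "Baltimore Ravens"],
    "Prediction": [419.10, 354.03, 417.33],
    "Pts/G": [22.3, 23.3, 25.9],
})

# Both columns come from df, so every differential keeps its own team label
# regardless of how the rows are later sorted.
diff_df = pd.DataFrame({
    "Teams": df["Teams"],
    "P/G O Diff": df["Prediction"] / 17 - df["Pts/G"],
}).sort_values("Teams").reset_index(drop=True)

print(diff_df)
```

When the two differentials live in separate files, `pd.merge(left, right, on="Teams")` gives the same protection, because rows are matched on the team name instead of the row number.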
### Bar Chart
Plot a bar chart from the data in the new DataFrame (team name against prediction/game minus points/game).
plt.figure(figsize=(10,5))
plt.bar(df1_sorted['Teams'], df1_sorted['P/G O Diff'],
        color = ['firebrick','darkred','purple','blue','cyan', 'orange', 'brown','darkorange',
                 'midnightblue', 'darkorange', 'cornflowerblue', 'forestgreen', 'darkblue', 'blue',
                 'turquoise', 'red','black','dodgerblue','blue','aqua', 'darkviolet','midnightblue',
                 'black', 'blue','darkgreen','lime','black','darkred','darkslategray','darkred',
                 'darkturquoise','maroon'])
plt.axhline(y = 0, color = 'black', linestyle = '-')
plt.xticks(rotation=90)
plt.show()
### Import DataFrame
Import the DataFrame created previously, which shows each team's points-allowed differential.
df2 = pd.read_csv(r'C:\Users\Rob\Documents\DPredDiff.csv')
df2

|  | Teams | P/G Diff |
|---|---|---|
| 0 | Arizona Cardinals | 1.346471 |
| 1 | Atlanta Falcons | -0.131765 |
| 2 | Baltimore Ravens | 1.464118 |
| 3 | Buffalo Bills | -0.903529 |
| 4 | Carolina Panthers | -1.268235 |
| 5 | Chicago Bears | -2.062941 |
| 6 | Cincinnati Bengals | -0.714706 |
| 7 | Cleveland Browns | 1.087059 |
| 8 | Dallas Cowboys | -3.148824 |
| 9 | Denver Broncos | -1.874706 |
| 10 | Detroit Lions | 4.165882 |
| 11 | Green Bay Packers | -3.542353 |
| 12 | Houston Texans | -1.117059 |
| 13 | Indianapolis Colts | -3.470000 |
| 14 | Jacksonville Jaguars | -1.446471 |
| 15 | Kansas City Chiefs | 0.705882 |
| 16 | Las Vegas Raiders | -0.735882 |
| 17 | Los Angeles Chargers | 3.228235 |
| 18 | Los Angeles Rams | 1.301765 |
| 19 | Miami Dolphins | -0.456471 |
| 20 | Minnesota Vikings | -4.195882 |
| 21 | New England Patriots | -0.597059 |
| 22 | New Orleans Saints | 3.064706 |
| 23 | New York Giants | -5.836471 |
| 24 | New York Jets | 1.105294 |
| 25 | Philadelphia Eagles | 2.311765 |
| 26 | Pittsburgh Steelers | -0.198235 |
| 27 | San Francisco 49ers | -0.297647 |
| 28 | Seattle Seahawks | 0.252941 |
| 29 | Tampa Bay Buccaneers | -2.858824 |
| 30 | Tennessee Titans | -2.528824 |
| 31 | Washington Commanders | -2.844706 |
Combine the data from both DataFrames.
ddiff = df2['P/G Diff']
df1_sorted['P/G D Diff'] = ddiff
df1_sorted

|  | Teams | P/G O Diff | P/G D Diff |
|---|---|---|---|
| 0 | Arizona Cardinals | -3.315882 | 1.346471 |
| 1 | Atlanta Falcons | -2.352941 | -0.131765 |
| 2 | Baltimore Ravens | -3.445294 | 1.464118 |
| 3 | Buffalo Bills | 2.474706 | -0.903529 |
| 4 | Carolina Panthers | 1.351176 | -1.268235 |
| 5 | Chicago Bears | -0.567059 | -2.062941 |
| 6 | Cincinnati Bengals | 2.532941 | -0.714706 |
| 7 | Cleveland Browns | -0.768235 | 1.087059 |
| 8 | Dallas Cowboys | 0.308824 | -3.148824 |
| 9 | Denver Broncos | -5.688235 | -1.874706 |
| 10 | Detroit Lions | 3.857647 | 4.165882 |
| 11 | Green Bay Packers | -3.807647 | -3.542353 |
| 12 | Houston Texans | -2.652941 | -1.117059 |
| 13 | Indianapolis Colts | -0.794118 | -3.470000 |
| 14 | Jacksonville Jaguars | -7.298824 | -1.446471 |
| 15 | Kansas City Chiefs | -2.430000 | 0.705882 |
| 16 | Las Vegas Raiders | -2.121765 | -0.735882 |
| 17 | Los Angeles Chargers | -3.560588 | 3.228235 |
| 18 | Los Angeles Rams | 1.375294 | 1.301765 |
| 19 | Miami Dolphins | 4.692941 | -0.456471 |
| 20 | Minnesota Vikings | 1.129412 | -4.195882 |
| 21 | New England Patriots | -2.514706 | -0.597059 |
| 22 | New Orleans Saints | -1.850000 | 3.064706 |
| 23 | New York Giants | -3.722353 | -5.836471 |
| 24 | New York Jets | 0.298235 | 1.105294 |
| 25 | Philadelphia Eagles | -2.402353 | 2.311765 |
| 26 | Pittsburgh Steelers | -1.646471 | -0.198235 |
| 27 | San Francisco 49ers | -4.862941 | -0.297647 |
| 28 | Seattle Seahawks | 2.787647 | 0.252941 |
| 29 | Tampa Bay Buccaneers | 2.977647 | -2.858824 |
| 30 | Tennessee Titans | -0.314706 | -2.528824 |
| 31 | Washington Commanders | -3.983529 | -2.844706 |
Define the variables.

t = df1_sorted['Teams']
o = df1_sorted['P/G O Diff']
d = df1_sorted['P/G D Diff']

### Bar Chart
Plot a grouped bar chart of all the data in the new DataFrame, so that every team has two bars showing its points-scored and points-allowed differentials.
plt.figure(figsize=(20,7))
width = 0.3
o_bar = np.arange(len(o))
d_bar = [x + width for x in o_bar]
plt.bar(o_bar, o, color='blue', width=width, edgecolor='black', label='Offense', align='edge')
plt.bar(d_bar, d, color='red', width=width, edgecolor='black', label='Defense', align='edge')
plt.axhline(y=0, color='black', linestyle='-')
plt.xlabel('Teams')
plt.ylabel('Point Differential')
plt.xticks([r + width for r in range(len(o))], t, rotation=90)
plt.legend()
plt.show()
# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons
All modules and libraries are imported. The CSV containing raw data from ProFootballReference.com is read with the pandas `read_csv` function and stored as `df`.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.graphics.factorplots import interaction_plot
from scipy import stats
from scipy.stats import rankdata
import seaborn as sns

df = pd.read_csv(r'C:\Users\Rob\Documents\2021stats.csv')
print(df)

Find the mean of the `PF` column across all rows. This value is added back to the DataFrame to use as the baseline later.

np.mean(df['PF'])

384.1041666666667
Check that all the cells in the DataFrame are filled.

df.describe()

One-hot encode the categorical data, turning it into numbers that the machine learning model can use.

df = pd.get_dummies(df)
df.iloc[:,:].head()

### Choose what is to be predicted
Define the variable to be predicted as `labels`. Then remove that column from the DataFrame and convert the DataFrame to an array.

labels = np.array(df['PF'])
df = df.drop('PF', axis = 1)
df_list = list(df.columns)
df = np.array(df)

Import the train_test_split function from sklearn.
from sklearn.model_selection import train_test_split

Create the training and testing splits, setting the size of the testing data.

train_df, test_df, train_labels, test_labels = train_test_split(df, labels, test_size = 0.25, random_state = 42)

Print the shapes of the splits created above to make sure there are no errors in them.
print('Training df Shape:', train_df.shape)
print('Training Labels Shape:', train_labels.shape)
print('Testing df Shape:', test_df.shape)
print('Testing Labels Shape:', test_labels.shape)

Training df Shape: (72, 116)
Training Labels Shape: (72,)
Testing df Shape: (24, 116)
Testing Labels Shape: (24,)
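The 116 feature columns follow from the encoding step. Assuming `Tm` is the only text column, each of its 96 values (team plus season) is unique, so `get_dummies` creates 96 indicator columns alongside the 20 numeric columns that remain once `PF` is dropped. With `test_size = 0.25`, the 96 rows split into 72 for training and 24 for testing. A toy sketch of the same steps on eight made-up rows; the frame `toy` and its values are illustrative only:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Eight made-up rows; every "Tm" label is unique, as in the real data.
toy = pd.DataFrame({
    "Tm": [f"Team{i}" for i in range(8)],
    "1D": [127, 75, 159, 134, 117, 119, 101, 138],
    "Ydsp": [4276, 3713, 3961, 4284, 3239, 3207, 4403, 3320],
    "PF": [449, 313, 387, 483, 304, 311, 460, 349],
})

# One-hot encode the text column: one indicator column per distinct label.
encoded = pd.get_dummies(toy, dtype=int)

# Separate the target from the features before splitting.
labels = encoded["PF"].to_numpy()
features = encoded.drop("PF", axis=1)

train_X, test_X, train_y, test_y = train_test_split(
    features.to_numpy(), labels, test_size=0.25, random_state=42
)

print(features.shape)   # 2 numeric columns + 8 indicator columns
print(train_X.shape, test_X.shape)
```

Because each indicator column is 1 for exactly one row, these columns identify individual rows rather than a reusable category, which is consistent with the near-zero importances they receive later.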
### Baseline
Use the `PFA` column (the mean of `PF` calculated earlier) as the baseline prediction, and measure its error against the testing labels.

baseline_preds = test_df[:, df_list.index('PFA')]
baseline_errors = abs(baseline_preds - test_labels)
print('Average baseline error: ', round(np.mean(baseline_errors), 2))

Average baseline error:  53.57
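The baseline is a constant guess: every team-season is predicted to score the overall mean, and the baseline error is the mean absolute distance from that guess. A minimal sketch using five `PF` values from the printed data and the 384.1 mean; the variable names mirror the notebook but the arrays here are an illustrative subset:

```python
import numpy as np

# Five PF values taken from the printed DataFrame (illustrative subset).
test_labels = np.array([449.0, 313.0, 387.0, 483.0, 304.0])

# Constant baseline: predict the overall mean for every row.
baseline_preds = np.full_like(test_labels, 384.1)

baseline_errors = np.abs(baseline_preds - test_labels)
baseline_mae = baseline_errors.mean()

print("Average baseline error:", round(baseline_mae, 2))
```

A model is only useful if its mean absolute error is clearly below this figure. One design choice worth considering is computing the mean from the training labels alone, so that no information from the test rows enters the baseline.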
### Random Forest Regression
Import RandomForestRegressor from sklearn. Then set the number of decision trees, in this case 1000, and train the model on the training set.

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
rf.fit(train_df, train_labels);

Now make predictions on the testing data and calculate the mean absolute error.
predictions = rf.predict(test_df)
errors = abs(predictions - test_labels)
print('Mean Absolute Error:', round(np.mean(errors), 2), 'points.')

Mean Absolute Error: 29.02 points.
Now find the accuracy of the predictions by calculating the absolute percentage error for each test row and subtracting its mean (the MAPE) from 100.

mape = 100 * (errors / test_labels)
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')

Accuracy: 91.89 %.
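The two metrics answer different questions: the mean absolute error is in the same units as the label (points over a season), while the accuracy figure is 100 minus the mean absolute percentage error. A small worked sketch with two made-up predictions; `mae_and_accuracy` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def mae_and_accuracy(predictions, actual):
    """Return (mean absolute error, 100 - MAPE) for two equal-length arrays."""
    errors = np.abs(predictions - actual)
    mape = 100 * np.mean(errors / actual)   # mean absolute percentage error
    return errors.mean(), 100 - mape

# Made-up values: errors of 100 and 50 points, each 20% of the true value.
mae, accuracy = mae_and_accuracy(np.array([400.0, 300.0]), np.array([500.0, 250.0]))

print("Mean Absolute Error:", mae)
print("Accuracy:", accuracy)
```

Note that the percentage error divides by the true value, so this definition only makes sense when the label is never zero, which holds for season point totals.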
### Displaying the Random Forest
Export one tree from the forest and convert it to a png file.

from sklearn.tree import export_graphviz
import pydot

tree = rf.estimators_[5]
export_graphviz(tree, out_file = 'tree.dot', feature_names = df_list, rounded = True, precision = 1)
(graph, ) = pydot.graph_from_dot_file('tree.dot')
graph.write_png('tree.png')

Read and display the image of the tree.
import matplotlib.image as mpimg
plt_1 = plt.figure(figsize=(100, 100))
img = mpimg.imread('tree.png')
imgplot = plt.imshow(img)
plt.show()

### List of Importances
Make a list of the variables and their importance in the prediction. The list is then sorted from most important to least.
importances = list(rf.feature_importances_)
# List of tuples with variable and importance
feature_importances = [(feature, round(importance, 2)) for feature, importance in zip(df_list, importances)]
# Sort the feature importances by most important first
feature_importances = sorted(feature_importances, key = lambda x: x[1], reverse = True)
# Print out the feature and importances
for pair in feature_importances:
    print('Variable: {:20} Importance: {}'.format(*pair))

Variable: 1D Importance: 0.25
Variable: Ydsp Importance: 0.22
Variable: CAY/PA Importance: 0.08
Variable: Att Importance: 0.06
Variable: Att/Br Importance: 0.06
Variable: Yds Importance: 0.05
Variable: YBC Importance: 0.04
Variable: BrkTkl Importance: 0.03
Variable: Cmp Importance: 0.03
Variable: CAY Importance: 0.03
Variable: YAC Importance: 0.02
Variable: YAC/Att Importance: 0.02
Variable: Attp Importance: 0.02
Variable: IAY Importance: 0.02
Variable: YACp Importance: 0.02
Variable: YBC/Att Importance: 0.01
Variable: IAY/PA Importance: 0.01
Variable: CAY/Cmp Importance: 0.01
Variable: YAC/Cmp Importance: 0.01
Variable: PFA Importance: 0.0
Variable: Tm_Arizona Cardinals19 Importance: 0.0
Variable: Tm_Arizona Cardinals20 Importance: 0.0
Variable: Tm_Arizona Cardinals21 Importance: 0.0
Variable: Tm_Atlanta Falcons19 Importance: 0.0
Variable: Tm_Atlanta Falcons20 Importance: 0.0
Variable: Tm_Atlanta Falcons21 Importance: 0.0
Variable: Tm_Baltimore Ravens19 Importance: 0.0
Variable: Tm_Baltimore Ravens20 Importance: 0.0
Variable: Tm_Baltimore Ravens21 Importance: 0.0
Variable: Tm_Buffalo Bills19 Importance: 0.0
Variable: Tm_Buffalo Bills20 Importance: 0.0
Variable: Tm_Buffalo Bills21 Importance: 0.0
Variable: Tm_Carolina Panthers19 Importance: 0.0
Variable: Tm_Carolina Panthers20 Importance: 0.0
Variable: Tm_Carolina Panthers21 Importance: 0.0
Variable: Tm_Chicago Bears19 Importance: 0.0
Variable: Tm_Chicago Bears20 Importance: 0.0
Variable: Tm_Chicago Bears21 Importance: 0.0
Variable: Tm_Cincinnati Bengals19 Importance: 0.0
Variable: Tm_Cincinnati Bengals20 Importance: 0.0
Variable: Tm_Cincinnati Bengals21 Importance: 0.0
Variable: Tm_Cleveland Browns19 Importance: 0.0
Variable: Tm_Cleveland Browns20 Importance: 0.0
Variable: Tm_Cleveland Browns21 Importance: 0.0
Variable: Tm_Dallas Cowboys19 Importance: 0.0
Variable: Tm_Dallas Cowboys20 Importance: 0.0
Variable: Tm_Dallas Cowboys21 Importance: 0.0
Variable: Tm_Denver Broncos19 Importance: 0.0
Variable: Tm_Denver Broncos20 Importance: 0.0
Variable: Tm_Denver Broncos21 Importance: 0.0
Variable: Tm_Detroit Lions19 Importance: 0.0
Variable: Tm_Detroit Lions20 Importance: 0.0
Variable: Tm_Detroit Lions21 Importance: 0.0
Variable: Tm_Green Bay Packers19 Importance: 0.0
Variable: Tm_Green Bay Packers20 Importance: 0.0
Variable: Tm_Green Bay Packers21 Importance: 0.0
Variable: Tm_Houston Texans19 Importance: 0.0
Variable: Tm_Houston Texans20 Importance: 0.0
Variable: Tm_Houston Texans21 Importance: 0.0
Variable: Tm_Indianapolis Colts19 Importance: 0.0
Variable: Tm_Indianapolis Colts20 Importance: 0.0
Variable: Tm_Indianapolis Colts21 Importance: 0.0
Variable: Tm_Jacksonville Jaguars19 Importance: 0.0
Variable: Tm_Jacksonville Jaguars20 Importance: 0.0
Variable: Tm_Jacksonville Jaguars21 Importance: 0.0
Variable: Tm_Kansas City Chiefs19 Importance: 0.0
Variable: Tm_Kansas City Chiefs20 Importance: 0.0
Variable: Tm_Kansas City Chiefs21 Importance: 0.0
Variable: Tm_Las Vegas Raiders20 Importance: 0.0
Variable: Tm_Las Vegas Raiders21 Importance: 0.0
Variable: Tm_Los Angeles Chargers19 Importance: 0.0
Variable: Tm_Los Angeles Chargers20 Importance: 0.0
Variable: Tm_Los Angeles Chargers21 Importance: 0.0
Variable: Tm_Los Angeles Rams19 Importance: 0.0
Variable: Tm_Los Angeles Rams20 Importance: 0.0
Variable: Tm_Los Angeles Rams21 Importance: 0.0
Variable: Tm_Miami Dolphins19 Importance: 0.0
Variable: Tm_Miami Dolphins20 Importance: 0.0
Variable: Tm_Miami Dolphins21 Importance: 0.0
Variable: Tm_Minnesota Vikings19 Importance: 0.0
Variable: Tm_Minnesota Vikings20 Importance: 0.0
Variable: Tm_Minnesota Vikings21 Importance: 0.0
Variable: Tm_New England Patriots19 Importance: 0.0
Variable: Tm_New England Patriots20 Importance: 0.0
Variable: Tm_New England Patriots21 Importance: 0.0
Variable: Tm_New Orleans Saints19 Importance: 0.0
Variable: Tm_New Orleans Saints20 Importance: 0.0
Variable: Tm_New Orleans Saints21 Importance: 0.0
Variable: Tm_New York Giants19 Importance: 0.0
Variable: Tm_New York Giants20 Importance: 0.0
Variable: Tm_New York Giants21 Importance: 0.0
Variable: Tm_New York Jets19 Importance: 0.0
Variable: Tm_New York Jets20 Importance: 0.0
Variable: Tm_New York Jets21 Importance: 0.0
Variable: Tm_Oakland Raiders19 Importance: 0.0
Variable: Tm_Philadelphia Eagles19 Importance: 0.0
Variable: Tm_Philadelphia Eagles20 Importance: 0.0
Variable: Tm_Philadelphia Eagles21 Importance: 0.0
Variable: Tm_Pittsburgh Steelers19 Importance: 0.0
Variable: Tm_Pittsburgh Steelers20 Importance: 0.0
Variable: Tm_Pittsburgh Steelers21 Importance: 0.0
Variable: Tm_San Francisco 49ers19 Importance: 0.0
Variable: Tm_San Francisco 49ers20 Importance: 0.0
Variable: Tm_San Francisco 49ers21 Importance: 0.0
Variable: Tm_Seattle Seahawks19 Importance: 0.0
Variable: Tm_Seattle Seahawks20 Importance: 0.0
Variable: Tm_Seattle Seahawks21 Importance: 0.0
Variable: Tm_Tampa Bay Buccaneers19 Importance: 0.0
Variable: Tm_Tampa Bay Buccaneers20 Importance: 0.0
Variable: Tm_Tampa Bay Buccaneers21 Importance: 0.0
Variable: Tm_Tennessee Titans19 Importance: 0.0
Variable: Tm_Tennessee Titans20 Importance: 0.0
Variable: Tm_Tennessee Titans21 Importance: 0.0
Variable: Tm_Washington Football Team20 Importance: 0.0
Variable: Tm_Washington Football Team21 Importance: 0.0
Variable: Tm_Washington Redskins19 Importance: 0.0
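The importances reported by a random forest are normalised so that they sum to 1, and the listing above rounds them to two decimals, so an entry of 0.0 means "below 0.005" rather than necessarily exactly zero. A minimal sketch of the ranking step on synthetic data, where one column drives the target and the other is pure noise; the feature names `signal` and `noise` and all values are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends on "signal" only.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
noise = rng.normal(size=200)
X = np.column_stack([signal, noise])
y = 3.0 * signal + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Pair each feature name with its importance and sort, most important first.
ranked = sorted(
    zip(["signal", "noise"], rf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranked:
    print(f"Variable: {name:20} Importance: {importance:.4f}")
```

Printing more decimals, as here, makes it easier to tell weak features apart from features that were never used in a split.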
# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons
All modules and libraries are imported. The CSV containing raw data from ProFootballReference.com is read with the pandas `read_csv` function and stored as `df`.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.graphics.factorplots import interaction_plot
from scipy import stats
from scipy.stats import rankdata
import seaborn as sns

df = pd.read_csv(r'C:\Users\Rob\Documents\2021stats.csv')
print(df)

Tm Att Yds 1D YBC YBC/Att YAC YAC/Att \
0 Arizona Cardinals21 496 2076 127 1286 2.6 790 1.6
1 Atlanta Falcons21 393 1451 75 811 2.1 640 1.6
2 Baltimore Ravens21 517 2479 159 1579 3.1 900 1.7
3 Buffalo Bills21 461 2209 134 1208 2.6 1001 2.2
4 Carolina Panthers21 455 1842 117 1036 2.3 806 1.8
.. ... ... ... ... ... ... ... ...
91 San Francisco 49ers19 498 2305 110 1472 3.0 833 1.7
92 Seattle Seahawks19 481 2200 121 1125 2.3 1075 2.2
93 Tampa Bay Buccaneers19 409 1521 81 746 1.8 775 1.9
94 Tennessee Titans19 445 2223 104 940 2.1 1283 2.9
95 Washington Redskins19 356 1583 74 680 1.9 903 2.5
BrkTkl Att/Br ... Ydsp IAY IAY/PA CAY CAY/Cmp CAY/PA YACp \
0 28 17.7 ... 4276 4459 7.5 2340 5.6 4.0 2279
1 19 20.7 ... 3713 4127 7.2 2252 6.0 3.9 1735
2 31 16.7 ... 3961 5239 8.6 2552 6.4 4.2 1715
3 40 11.5 ... 4284 5364 8.2 2690 6.5 4.1 1760
4 31 14.7 ... 3239 4434 7.4 1751 5.0 2.9 1822
.. ... ... ... ... ... ... ... ... ... ...
91 29 17.2 ... 3792 3124 6.5 1837 5.5 3.8 2192
92 38 12.7 ... 3791 4869 9.4 2402 7.0 4.6 1708
93 35 11.7 ... 4845 6498 10.3 3254 8.5 5.2 1873
94 40 11.1 ... 3582 3869 8.6 2103 7.1 4.7 1853
95 15 23.7 ... 2812 3639 7.6 1751 5.9 3.7 1454
YAC/Cmp PF PFA
0 5.5 449 384.1
1 4.6 313 384.1
2 4.3 387 384.1
3 4.2 483 384.1
4 5.2 304 384.1
.. ... ... ...
91 6.6 479 384.1
92 5.0 405 384.1
93 4.9 458 384.1
94 6.2 402 384.1
95 4.9 266 384.1
[96 rows x 22 columns]
### Bar Chart
Display the teams and a variable.
plt.figure(figsize=(17,5))
plt.bar(df['Tm'], df['Ydsp'])
plt.xticks(rotation = 90)
plt.ylim(2500, 5250)
plt.show()

plt.scatter(df['PF'], df['1D'])

<matplotlib.collections.PathCollection at 0x175720b5f40>
Define the columns that will be used in the models or tests.

pf = df['PF']
pf = np.array(pf).reshape((-1,1))
d1 = df['1D']

### Linear Regression
Perform a linear regression with `PF` as the predictor and `1D` as the response.
model = LinearRegression().fit(pf, d1)
r_sq = model.score(pf, d1)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")

coefficient of determination: 0.26945079972915786
intercept: 41.16998558184153
slope: [0.17247061]
xxxxxxxxxxd1_pred = model.intercept_ + model.coef_ * pfprint(f"predicted response:\n{d1_pred}")predicted response: [[118.60929082] [ 95.15328745] [107.91611282] [124.47329167] [ 93.60105194] [ 94.80834623] [120.50646757] [101.36222952] [132.57941048] [ 98.94764094] [ 97.22293481] [118.78176144] [ 89.46175722] [118.95423205] [ 84.80505067] [123.95587983] [105.67399485] [122.92105615] [120.50646757] [ 99.98246462] [114.46999611] [120.85140879] [103.94928872] [ 85.66740374] [ 94.63587562] [117.74693776] [100.32740584] [114.81493734] [109.29587772] [129.30246883] [113.43517243] [ 98.94764094] [111.88293692] [109.46834833] [121.88623247] [127.5777627 ] [101.53470014] [105.32905362] [ 94.80834623] [111.53799569] [109.29587772] [ 96.87799358] [106.19140669] [128.9575276 ] [107.39870098] [118.95423205] [ 93.94599316] [122.74858554] [116.02223163] [107.39870098] [105.32905362] [110.84811324] [115.33234918] [ 97.39540542] [124.30082105] [ 89.46175722] [ 83.08034454] [ 98.77517033] [112.91776059] [106.01893607] [120.33399695] [126.02552718] [125.85305657] [ 98.94764094] [103.43187688] [106.88128914] [132.75188109] [ 95.32575807] [ 99.80999401] [ 89.46175722] [ 89.28928661] [ 98.94764094] [116.02223163] [ 89.80669845] [ 99.98246462] [106.01893607] [106.3638773 ] [103.43187688] [ 92.91116948] [118.95423205] [ 99.29258217] [109.12340711] [ 93.94599316] [111.36552508] [113.60764305] [120.16152634] [ 99.98246462] [ 88.77187477] [ 95.15328745] [107.57117159] [ 91.01399274] [123.78340921] [111.02058385] [120.16152634] [110.50317201] [ 87.04716864]]
### Redefining
Redefine the variables so that x holds the two columns that were most important in the Random Forest Regression.
x = df['1D'], df['Ydsp']
y = df['PF']
x, y = np.array(x), np.array(y)

x = np.vstack((df['1D'], df['Ydsp'])).T
x

array([[ 127, 4276],
[ 75, 3713],
[ 159, 3961],
[ 134, 4284],
[ 117, 3239],
[ 119, 3207],
[ 101, 4403],
[ 138, 3320],
[ 111, 4800],
[ 123, 3593],
[ 104, 3598],
[ 109, 4315],
[ 77, 3305],
[ 154, 3361],
[ 92, 3436],
[ 119, 4791],
[ 95, 4567],
[ 112, 4800],
[ 101, 4642],
[ 87, 3651],
[ 103, 4238],
[ 139, 3857],
[ 113, 3186],
[ 90, 3196],
[ 87, 3541],
[ 163, 3404],
[ 85, 3778],
[ 130, 4221],
[ 106, 3432],
[ 106, 5229],
[ 134, 3418],
[ 129, 3441],
[ 136, 3916],
[ 86, 4363],
[ 165, 2739],
[ 119, 4620],
[ 110, 3888],
[ 93, 3655],
[ 92, 3448],
[ 133, 3539],
[ 115, 4161],
[ 94, 3451],
[ 93, 4104],
[ 114, 4106],
[ 83, 4538],
[ 129, 4053],
[ 80, 3699],
[ 110, 4854],
[ 121, 4217],
[ 111, 4329],
[ 123, 4014],
[ 100, 3736],
[ 139, 4009],
[ 143, 2890],
[ 147, 3758],
[ 91, 3026],
[ 94, 2796],
[ 114, 3327],
[ 81, 4003],
[ 101, 4033],
[ 111, 3941],
[ 82, 4626],
[ 142, 3653],
[ 108, 3465],
[ 109, 3477],
[ 84, 4714],
[ 188, 3225],
[ 120, 3229],
[ 82, 3650],
[ 85, 3291],
[ 85, 3652],
[ 90, 3554],
[ 120, 4751],
[ 76, 3115],
[ 82, 3900],
[ 90, 3733],
[ 112, 3783],
[ 131, 3108],
[ 84, 3760],
[ 93, 4498],
[ 90, 4426],
[ 92, 4499],
[ 64, 3804],
[ 106, 3523],
[ 110, 3961],
[ 97, 4244],
[ 89, 3731],
[ 61, 3111],
[ 104, 3926],
[ 104, 3833],
[ 75, 2981],
[ 110, 3792],
[ 121, 3791],
[ 81, 4845],
[ 104, 3582],
[ 74, 2812]], dtype=int64)

### Linear Regression
Fit a linear regression of y on the two x variables, so we can now see the coefficient of determination and the coefficients of the combined model.
model = LinearRegression().fit(x, y)
r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"coefficients: {model.coef_}")

coefficient of determination: 0.7068457568512116
intercept: -135.88423483853654
coefficients: [1.73092809 0.08727336]
Use the model to predict y based on the x variables already used.
xxxxxxxxxxy_pred = model.predict(x)print(f"predicted response:\n{y_pred}")predicted response: [457.12451928 317.98135694 485.0231099 469.93920281 349.31276427 349.98187294 423.20410558 392.73139635 475.16091034 390.59310219 358.14183523 429.37147466 285.83568233 424.00445357 323.23241384 488.22287484 427.13136803 476.89183843 444.06243857 333.34154574 412.2658574 441.32811864 337.76356383 298.82495131 323.74147616 443.33556087 340.96340625 457.51726877 347.11631369 503.94654122 394.36047324 387.71312005 441.28446259 393.74924981 388.7606328 473.29913031 393.83667812 344.07620773 324.27969416 403.18962168 426.3169458 328.00337043 383.26194627 419.78598292 403.8293035 441.12441624 325.41417036 478.14274367 441.58982251 434.05515788 427.33518665 363.26184652 454.59366933 363.8584921 446.53548076 285.71940824 270.83931977 351.80003565 353.67619983 390.91296246 400.19309428 409.77843106 428.71713752 353.45819075 356.23639916 420.92034291 470.98683177 353.63281495 324.59963192 298.46128003 329.96696291 330.06881411 486.46286854 267.52281588 346.41797186 345.69074552 388.13483153 362.11294743 337.66155768 417.64765003 406.17118384 416.00399529 306.88302367 355.05818944 400.20763339 402.40392901 343.7852707 241.20980106 386.76749724 378.65107478 254.09725758 385.45843558 404.41137124 427.16036876 356.74546148 237.61713168]
xxxxxxxxxxy_pred = model.intercept_ + np.sum(model.coef_ * x, axis=1)print(f"predicted response:\n{y_pred}")predicted response: [457.12451928 317.98135694 485.0231099 469.93920281 349.31276427 349.98187294 423.20410558 392.73139635 475.16091034 390.59310219 358.14183523 429.37147466 285.83568233 424.00445357 323.23241384 488.22287484 427.13136803 476.89183843 444.06243857 333.34154574 412.2658574 441.32811864 337.76356383 298.82495131 323.74147616 443.33556087 340.96340625 457.51726877 347.11631369 503.94654122 394.36047324 387.71312005 441.28446259 393.74924981 388.7606328 473.29913031 393.83667812 344.07620773 324.27969416 403.18962168 426.3169458 328.00337043 383.26194627 419.78598292 403.8293035 441.12441624 325.41417036 478.14274367 441.58982251 434.05515788 427.33518665 363.26184652 454.59366933 363.8584921 446.53548076 285.71940824 270.83931977 351.80003565 353.67619983 390.91296246 400.19309428 409.77843106 428.71713752 353.45819075 356.23639916 420.92034291 470.98683177 353.63281495 324.59963192 298.46128003 329.96696291 330.06881411 486.46286854 267.52281588 346.41797186 345.69074552 388.13483153 362.11294743 337.66155768 417.64765003 406.17118384 416.00399529 306.88302367 355.05818944 400.20763339 402.40392901 343.7852707 241.20980106 386.76749724 378.65107478 254.09725758 385.45843558 404.41137124 427.16036876 356.74546148 237.61713168]
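The two cells above print identical numbers because `model.predict` for a linear regression is exactly the intercept plus the weighted sum of the inputs. A short sketch of that equivalence on the first five rows of the printed data (`1D`, `Ydsp` and `PF`); fitting on five rows is only for illustration and will not reproduce the coefficients above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First five rows of 1D and Ydsp, with their PF values (illustrative subset).
x = np.array([[127, 4276], [75, 3713], [159, 3961], [134, 4284], [117, 3239]])
y = np.array([449, 313, 387, 483, 304])

model = LinearRegression().fit(x, y)

via_predict = model.predict(x)
# Same calculation written out: intercept + coef_1 * 1D + coef_2 * Ydsp.
via_formula = model.intercept_ + x @ model.coef_

print(via_predict)
print(via_formula)
```

The matrix product `x @ model.coef_` is equivalent to `np.sum(model.coef_ * x, axis=1)` used in the notebook.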
Input the current-season data for the variables used to build the model above. None of these variables are averages (such as yards per play), so they had to be converted to full-season projections. This was done by dividing each figure by the number of games the team has played and multiplying by 17 (the number of games in a season).
x_new = [153,2579],[97,5451],[82,2885],[133,2088],[90,4738],[163,3645],[85,4429],[131,3761],[106,3115],[87,3725],[96,3635],[106,3761],[99,3558],[119,3975],[92,5032],[80,4786],[96,4114],[128,4356],[121,3796],[167,2734],[102,4369],[96,2813],[153,3475],[82,3582],[130,4057],[68,4068],[157,3225],[87,4750],[114,3837],[116,3982],[72,4524],[104,3757]
x_new

([153, 2579], [97, 5451], [82, 2885], [133, 2088], [90, 4738], [163, 3645], [85, 4429], [131, 3761], [106, 3115], [87, 3725], [96, 3635], [106, 3761], [99, 3558], [119, 3975], [92, 5032], [80, 4786], [96, 4114], [128, 4356], [121, 3796], [167, 2734], [102, 4369], [96, 2813], [153, 3475], [82, 3582], [130, 4057], [68, 4068], [157, 3225], [87, 4750], [114, 3837], [116, 3982], [72, 4524], [104, 3757])
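The conversion described above is a simple rate calculation: divide the running total by games played, then multiply by the 17 games in a season. A minimal sketch, checked against two rows of the `P/G*17` column in the Prediction table; `project_full_season` is a hypothetical helper name, not a function from the notebook:

```python
def project_full_season(total, games_played, season_games=17):
    """Scale a running season total to a full-season projection."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return total / games_played * season_games

# Arizona: 156 points in 7 games; Buffalo: 176 points in 6 games.
arizona = project_full_season(156, 7)
buffalo = project_full_season(176, 6)

print(round(arizona, 2))
print(round(buffalo, 2))
```

The same helper could be applied to each input column before building `x_new`, which avoids typing projected values by hand.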
### Predicting current season
Use the data from the current season to predict every team's points scored over the full season.
y_new = model.predict(x_new)
y_new

array([354.02575813, 507.74287427, 257.83551168, 276.55597663,
433.4004721 , 464.36844058, 397.77836346, 419.10245137,
319.45065864, 339.79977436, 347.52352481, 375.82924906,
345.99626038, 417.00781325, 462.52069606, 420.28031245,
389.32746415, 465.83731616, 404.84773804, 391.78612218,
421.96773944, 275.78482307, 432.22268849, 318.66504345,
443.20443777, 336.84690302, 417.32806092, 429.25496814,
396.30944914, 412.4259425 , 383.56726745, 372.01829944])

Print the predictions for each team, which are in alphabetical order.
ylist = list(y_new)
print(type(ylist))
print(ylist)
print(len(ylist))

<class 'list'>
[354.025758128841, 507.7428742662682, 257.8355116840193, 276.55597663408105, 433.4004720970019, 464.3684405774597, 397.7783634638899, 419.1024513654133, 319.4506586439897, 339.79977436144287, 347.5235248098312, 375.82924906324644, 345.9962603828671, 417.00781325374936, 462.5206960571218, 420.28031244567745, 389.327464145472, 465.8373161595213, 404.84773803692116, 391.7861221842847, 421.9677394424355, 275.7848230689193, 432.2226884936305, 318.66504345216475, 443.2044377688374, 336.8469030170672, 417.3280609164444, 429.2549681381275, 396.30944914338187, 412.4259424959642, 383.5672674460657, 372.0182994399446]
32
Create a tuple of all the teams in the NFL, listed in the order used to label the predictions.

Teams = ('Atlanta Falcons', 'Buffalo Bills', 'Carolina Panthers', 'Chicago Bears',
         'Cincinnati Bengals', 'Cleveland Browns', 'Indianapolis Colts', 'Arizona Cardinals',
         'Dallas Cowboys', 'Denver Broncos', 'Detroit Lions', 'Green Bay Packers',
         'Houston Texans', 'Jacksonville Jaguars', 'Kansas City Chiefs', 'Miami Dolphins',
         'Minnesota Vikings', 'New Orleans Saints', 'New England Patriots', 'New York Giants',
         'New York Jets', 'Tennessee Titans', 'Philadelphia Eagles', 'Pittsburgh Steelers',
         'Las Vegas Raiders', 'Los Angeles Rams', 'Baltimore Ravens', 'Los Angeles Chargers',
         'Seattle Seahawks', 'San Francisco 49ers', 'Tampa Bay Buccaneers', 'Washington Commanders')
print(len(Teams))

32
Create a data frame for the predictions.

data = ylist
df1 = pd.DataFrame(data, columns=['Pred'])
df1

df1['Teams'] = Teams
print(df1)

Save the data frame.
df1.to_csv(r'C:\Users\Rob\Documents\Prediction.csv', index = False)

---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_6928/2031236065.py in <module>
----> 1 df1.to_csv(r'C:\Users\Rob\Documents\Prediction.csv', index = False)

~\mambaforge\lib\site-packages\pandas\core\generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal, errors, storage_options)
   3549         )
   3550
-> 3551         return DataFrameRenderer(formatter).to_csv(
   3552             path_or_buf,
   3553             line_terminator=line_terminator,

~\mambaforge\lib\site-packages\pandas\io\formats\format.py in to_csv(self, path_or_buf, encoding, sep, columns, index_label, mode, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, errors, storage_options)
   1178             formatter=self.fmt,
   1179         )
-> 1180         csv_formatter.save()
   1181
   1182         if created_buffer:

~\mambaforge\lib\site-packages\pandas\io\formats\csvs.py in save(self)
    239         """
    240         # apply compression and byte/text conversion
--> 241         with get_handle(
    242             self.filepath_or_buffer,
    243             self.mode,

~\mambaforge\lib\site-packages\pandas\io\common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    787         if ioargs.encoding and "b" not in ioargs.mode:
    788             # Encoding
--> 789             handle = open(
    790                 handle,
    791                 ioargs.mode,

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Rob\\Documents\\Prediction.csv'
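The `PermissionError` above means the operating system refused to open `Prediction.csv` for writing. A common cause on Windows is that the file is already open in another program such as a spreadsheet, although the traceback alone does not confirm that this is what happened here. The following is a minimal sketch of one workaround, assuming it is acceptable to write to a differently named file in the same folder. The helper name `save_with_fallback` is hypothetical, and `write` stands for any callable that takes a path.

```python
from datetime import datetime
from pathlib import Path


def save_with_fallback(write, path):
    """Call write(path); on PermissionError, retry with a timestamped file name.

    Returns the path that was actually written.
    """
    target = Path(path)
    try:
        write(target)
        return target
    except PermissionError:
        # The target could not be opened for writing, so try a sibling file.
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        fallback = target.with_name(f"{target.stem}_{stamp}{target.suffix}")
        write(fallback)
        return fallback


# Hypothetical usage inside the notebook (df1 is the data frame built above):
# save_with_fallback(lambda p: df1.to_csv(p, index=False),
#                    r'C:\Users\Rob\Documents\Prediction.csv')
```

Closing the program that holds the file and rerunning the original cell is usually the simpler fix; the fallback only avoids losing the result in the meantime.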
# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons
Import all necessary libraries and modules, and read in the chosen dataframe.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.graphics.factorplots import interaction_plot
from scipy import stats
import matplotlib.colors as mcolors
from scipy.stats import rankdata
import seaborn as sns
df = pd.read_csv(r'C:\Users\Rob\Documents\dstats.csv')
print(df)

Define the 3 most important variables from the Random Forest Regression that are not directly related, along with the y variable, which is what the model was trained to predict. Combine the 3 variables into a single array.
# Target: points allowed
y = np.array(df['PA'])
# Features: one row per team-season, one column per variable
x = np.vstack((df['ANY/A'], df['Rate'], df['Ydsr'])).T
print(x)

### Linear Regression
Conduct a linear regression on the variables defined above.
model = LinearRegression().fit(x, y)
r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"coefficients: {model.coef_}")

coefficient of determination: 0.7507669629553981
intercept: -72.00298605592161
coefficients: [29.23877774  1.76379634  0.05969919]

Prepare the model to predict the points allowed for the current season.
# Prediction using the fitted model
y_pred = model.predict(x)
print(f"predicted response:\n{y_pred}")
# The same prediction written out by hand: intercept + sum of coefficient * variable
y_pred = model.intercept_ + np.sum(model.coef_ * x, axis=1)
print(f"predicted response:\n{y_pred}")

### Current season data
Input the data for the chosen variables for the current season. The data was taken after week 7, which is problematic because some teams had already had their bye week, so the data had to be taken manually from ProFootballReference. Statistics that are already averaged or otherwise not cumulative, such as QB Rate and ANY/A, do not suffer from this issue, but Ydsr does. For Ydsr, a projected figure was therefore used: the current total divided by games played, then multiplied by 17 (the number of games in the regular season).
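The 17-game projection described above can be sketched as follows. The function name `project_to_full_season` is hypothetical; only the rule itself (divide by games played, multiply by 17) comes from the text. The example input is the Arizona Cardinals' points allowed (176 through 7 games) from the analysis notebook further down, used here because the raw Ydsr totals are not shown.

```python
REGULAR_SEASON_GAMES = 17  # number of regular-season games, as stated in the text


def project_to_full_season(cumulative_value, games_played,
                           season_games=REGULAR_SEASON_GAMES):
    """Scale a cumulative stat to a full-season pace."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return cumulative_value / games_played * season_games


# 176 points allowed through 7 games, projected over 17 games
print(round(project_to_full_season(176, 7), 2))  # 427.43
```

Teams that have had their bye simply pass a smaller `games_played`, which is what puts every team on the same 17-game scale.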
x_new = ([6.9, 97.2, 1719], [7.2, 100.2, 1705], [5.9, 90.1, 1787], [3.9, 71.2, 1294],
         [6.1, 87.7, 2054], [4.9, 75.5, 2545], [5.2, 75.2, 2023], [7, 92.7, 2304],
         [4.1, 77.7, 2042], [4, 74.9, 2238], [7.8, 102.4, 2372], [6.1, 95, 2372],
         [5.9, 78.8, 2799], [6.1, 94.9, 2098], [5.8, 84.1, 1874], [6.9, 103.8, 1564],
         [7.5, 104.7, 1768], [5.9, 93, 2338], [5.6, 86.5, 1517], [6.9, 98.9, 1755],
         [6.8, 93.7, 1912], [5.3, 81.5, 2324], [7.1, 98.4, 2093], [6.5, 85.5, 2455],
         [4.9, 77.3, 1789], [3.5, 66, 1867], [6.6, 87.6, 2020], [5.3, 86.6, 1547],
         [6.8, 95.9, 2545], [5.1, 86.4, 2010], [6.9, 100.4, 1646], [7, 99.1, 2010])
print(x_new)

([6.9, 97.2, 1719], [7.2, 100.2, 1705], [5.9, 90.1, 1787], [3.9, 71.2, 1294], [6.1, 87.7, 2054], [4.9, 75.5, 2545], [5.2, 75.2, 2023], [7, 92.7, 2304], [4.1, 77.7, 2042], [4, 74.9, 2238], [7.8, 102.4, 2372], [6.1, 95, 2372], [5.9, 78.8, 2799], [6.1, 94.9, 2098], [5.8, 84.1, 1874], [6.9, 103.8, 1564], [7.5, 104.7, 1768], [5.9, 93, 2338], [5.6, 86.5, 1517], [6.9, 98.9, 1755], [6.8, 93.7, 1912], [5.3, 81.5, 2324], [7.1, 98.4, 2093], [6.5, 85.5, 2455], [4.9, 77.3, 1789], [3.5, 66, 1867], [6.6, 87.6, 2020], [5.3, 86.6, 1547], [6.8, 95.9, 2545], [5.1, 86.4, 2010], [6.9, 100.4, 1646], [7, 99.1, 2010])
### Model the current season data
Now use the current-season data to have the model predict the number of points each team will allow over the season.
y_new = model.predict(x_new)
y_new

array([403.80849401, 417.03572765, 366.10630729, 244.86129976,
383.66063572, 356.36809008, 333.44760663, 433.71931519,
306.82872659, 310.66726058, 478.27870685, 415.52069181,
406.59099032, 398.98673375, 357.79348115, 406.19617516,
437.50549354, 404.11557108, 334.86622551, 408.95611866,
406.23327298, 365.45285791, 434.10030271, 415.4151706 ,
314.41033485, 258.20168434, 396.07387245, 328.06194756,
447.90321302, 349.50215832, 405.09460131, 427.45604949])

Now store the predictions as a list and display its type and length, which must be 32 (the number of teams in the NFL).

ylist = list(y_new)
print(type(ylist))
print(ylist)
print(len(ylist))

<class 'list'>
[403.80849400500233, 417.03572765423337, 366.1063072939253, 244.86129975785502, 383.66063572031277, 356.3680900799129, 333.447606633793, 433.71931519290433, 306.82872659012247, 310.6672605762693, 478.2787068459364, 415.5206918052479, 406.5909903224467, 398.98673375048844, 357.7934811542876, 406.1961751617915, 437.50549353751956, 404.11557108196075, 334.86622550923886, 408.9561186620054, 406.2332729826779, 365.4528579054986, 434.10030270854656, 415.4151705987924, 314.4103348462005, 258.20168434438426, 396.07387245301845, 328.0619475596736, 447.90321302431096, 349.5021583247003, 405.09460131082767, 427.45604948942344]
32
Now create a new list of all the teams in alphabetical order.

Teams = ('Arizona Cardinals', 'Atlanta Falcons', 'Baltimore Ravens', 'Buffalo Bills',
         'Carolina Panthers', 'Chicago Bears', 'Cincinnati Bengals', 'Cleveland Browns',
         'Dallas Cowboys', 'Denver Broncos', 'Detroit Lions', 'Green Bay Packers',
         'Houston Texans', 'Indianapolis Colts', 'Jacksonville Jaguars', 'Kansas City Chiefs',
         'Las Vegas Raiders', 'Los Angeles Chargers', 'Los Angeles Rams', 'Miami Dolphins',
         'Minnesota Vikings', 'New England Patriots', 'New Orleans Saints', 'New York Giants',
         'New York Jets', 'Philadelphia Eagles', 'Pittsburgh Steelers', 'San Francisco 49ers',
         'Seattle Seahawks', 'Tampa Bay Buccaneers', 'Tennessee Titans', 'Washington Commanders')
print(len(Teams))

32
### New dataframe
Create a new dataframe consisting of the teams and their corresponding points predicted by the model for the current season.

data = ylist
df1 = pd.DataFrame(data, columns=['Pred'])
df1['Teams'] = Teams
print(df1)

          Pred                  Teams
0   403.808494      Arizona Cardinals
1   417.035728        Atlanta Falcons
2   366.106307       Baltimore Ravens
3   244.861300          Buffalo Bills
4   383.660636      Carolina Panthers
5   356.368090          Chicago Bears
6   333.447607     Cincinnati Bengals
7   433.719315       Cleveland Browns
8   306.828727         Dallas Cowboys
9   310.667261         Denver Broncos
10  478.278707          Detroit Lions
11  415.520692      Green Bay Packers
12  406.590990         Houston Texans
13  398.986734     Indianapolis Colts
14  357.793481   Jacksonville Jaguars
15  406.196175     Kansas City Chiefs
16  437.505494      Las Vegas Raiders
17  404.115571   Los Angeles Chargers
18  334.866226       Los Angeles Rams
19  408.956119         Miami Dolphins
20  406.233273      Minnesota Vikings
21  365.452858   New England Patriots
22  434.100303     New Orleans Saints
23  415.415171        New York Giants
24  314.410335          New York Jets
25  258.201684    Philadelphia Eagles
26  396.073872    Pittsburgh Steelers
27  328.061948    San Francisco 49ers
28  447.903213       Seattle Seahawks
29  349.502158   Tampa Bay Buccaneers
30  405.094601       Tennessee Titans
31  427.456049  Washington Commanders
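Because the teams and predictions are matched purely by position, the pairing is only correct if both sequences have the same length and the same ordering as the rows of `x_new`. A minimal sketch of a length guard is shown below; the helper name `pair_predictions` is hypothetical, and the three values are the first three rows of the output above. It cannot detect a wrong ordering, only a missing or extra entry.

```python
def pair_predictions(teams, predictions):
    """Pair each team with its prediction, refusing mismatched lengths."""
    if len(teams) != len(predictions):
        raise ValueError(
            f"{len(teams)} teams but {len(predictions)} predictions"
        )
    return dict(zip(teams, predictions))


# First three rows of the dataframe printed above
teams = ('Arizona Cardinals', 'Atlanta Falcons', 'Baltimore Ravens')
preds = [403.808494, 417.035728, 366.106307]

paired = pair_predictions(teams, preds)
print(paired['Atlanta Falcons'])  # 417.035728
```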
### Save
Save the data frame.

df1.to_csv(r'C:\Users\Rob\Documents\DPrediction.csv', index = False)
# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons
Import all necessary libraries and modules. The dataframe used here includes the predictions made, the points allowed, the points allowed per game (from ProFootballReference) and the point projection, which is the points allowed divided by games played and multiplied by 17.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.graphics.factorplots import interaction_plot
from scipy import stats
import matplotlib.colors as mcolors
from scipy.stats import rankdata
import seaborn as sns
df = pd.read_csv(r'C:\Users\Rob\Documents\DPrediction.csv')
print(df)

Make a new column by dividing the predictions by 17 to get the predicted points allowed per game.
preg = df['Pred']/17
df['Pred/G'] = preg
df

| | Teams | G | Pred | PA | Pts/G | Point Projection | Pred/G |
|---|---|---|---|---|---|---|---|
| 0 | Arizona Cardinals | 7 | 403.81 | 176 | 25.1 | 427.428571 | 23.753529 |
| 1 | Atlanta Falcons | 7 | 417.04 | 171 | 24.4 | 415.285714 | 24.531765 |
| 2 | Baltimore Ravens | 7 | 366.11 | 161 | 23.0 | 391.000000 | 21.535882 |
| 3 | Buffalo Bills | 6 | 244.86 | 81 | 13.5 | 229.500000 | 14.403529 |
| 4 | Carolina Panthers | 7 | 383.66 | 149 | 21.3 | 361.857143 | 22.568235 |
| 5 | Chicago Bears | 7 | 356.37 | 132 | 18.9 | 320.571429 | 20.962941 |
| 6 | Cincinnati Bengals | 7 | 333.45 | 132 | 18.9 | 320.571429 | 19.614706 |
| 7 | Cleveland Browns | 7 | 433.72 | 186 | 26.6 | 451.714286 | 25.512941 |
| 8 | Dallas Cowboys | 7 | 306.83 | 104 | 14.9 | 252.571429 | 18.048824 |
| 9 | Denver Broncos | 7 | 310.67 | 115 | 16.4 | 279.285714 | 18.274706 |
| 10 | Detroit Lions | 6 | 478.28 | 194 | 32.3 | 549.666667 | 28.134118 |
| 11 | Green Bay Packers | 7 | 415.52 | 146 | 20.9 | 354.571429 | 24.442353 |
| 12 | Houston Texans | 6 | 406.59 | 137 | 22.8 | 388.166667 | 23.917059 |
| 13 | Indianapolis Colts | 7 | 398.99 | 140 | 20.0 | 340.000000 | 23.470000 |
| 14 | Jacksonville Jaguars | 7 | 357.79 | 137 | 19.6 | 332.714286 | 21.046471 |
| 15 | Kansas City Chiefs | 7 | 406.20 | 172 | 24.6 | 417.714286 | 23.894118 |
| 16 | Las Vegas Raiders | 6 | 437.51 | 150 | 25.0 | 425.000000 | 25.735882 |
| 17 | Los Angeles Chargers | 7 | 404.12 | 189 | 27.0 | 459.000000 | 23.771765 |
| 18 | Los Angeles Rams | 6 | 334.87 | 126 | 21.0 | 357.000000 | 19.698235 |
| 19 | Miami Dolphins | 7 | 408.96 | 165 | 23.6 | 400.714286 | 24.056471 |
| 20 | Minnesota Vikings | 6 | 406.23 | 118 | 19.7 | 334.333333 | 23.895882 |
| 21 | New England Patriots | 7 | 365.45 | 146 | 20.9 | 354.571429 | 21.497059 |
| 22 | New Orleans Saints | 7 | 434.10 | 200 | 28.6 | 485.714286 | 25.535294 |
| 23 | New York Giants | 7 | 415.42 | 130 | 18.6 | 315.714286 | 24.436471 |
| 24 | New York Jets | 7 | 314.41 | 137 | 19.6 | 332.714286 | 18.494706 |
| 25 | Philadelphia Eagles | 6 | 258.20 | 105 | 17.5 | 297.500000 | 15.188235 |
| 26 | Pittsburgh Steelers | 7 | 396.07 | 162 | 23.1 | 393.428571 | 23.298235 |
| 27 | San Francisco 49ers | 7 | 328.06 | 133 | 19.0 | 323.000000 | 19.297647 |
| 28 | Seattle Seahawks | 7 | 447.90 | 186 | 26.6 | 451.714286 | 26.347059 |
| 29 | Tampa Bay Buccaneers | 7 | 349.50 | 124 | 17.7 | 301.142857 | 20.558824 |
| 30 | Tennessee Titans | 6 | 405.09 | 128 | 21.3 | 362.666667 | 23.828824 |
| 31 | Washington Commanders | 7 | 427.46 | 156 | 22.3 | 378.857143 | 25.144706 |
### Scatter Graph
Scatter the projected points allowed against the predicted points allowed.

plt.scatter(df['Point Projection'], df['Pred'], c = df['Pred'], cmap = 'Reds')
plt.xlabel('Point Projection')
plt.ylabel('My Prediction')
plt.title('Point Projection vs My Prediction')
plt.xlim(220,560)
plt.ylim(220,560)
plt.show()

### Defining Variables
Define the variables and reshape them so that a linear regression can be conducted more easily.
pp = df['Point Projection']
pp = np.array(pp).reshape((-1,1))
pre = df['Pred']
preg = df['Pred/G']
preg = np.array(preg).reshape(-1,1)
pg = df['Pts/G']

### Linear Regression
Conduct the linear regression for the predicted and projected points allowed per game.
model = LinearRegression().fit(preg, pg)
r_sq = model.score(preg, pg)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")

coefficient of determination: 0.6817815162467973
intercept: -1.6450878163398777
slope: [1.04538607]

yr_proj = df['Point Projection']
# Back to a 1-D Series so it can be subtracted from pg element by element
preg = df['Pred']/17
### Projection and Prediction Comparison
First display the difference between the prediction and the projected points allowed. In this model it is appropriate to do this for both the full-year and the per-game data, to check that they share the same correlation coefficient. The per-game data is more valuable because it makes it easier to see how different the predictions and projections are: points in the NFL are usually scored in 3s or 7s. Scores of 2, 6 and 8 also occur, but they are less common and situational, which this model cannot control for. The sum of the absolute differences and the sum of the squared differences are also reported.

print("difference:", pre - yr_proj)
print("Sum of All Differences:", np.sum(np.abs(pre - yr_proj)))
print("Sum of Squared Differences:", np.sum(np.square(pre - yr_proj)))
print("correlation:", np.corrcoef(np.array((pre, yr_proj)))[0, 1])

difference: 0    -23.618571
1      1.754286
2    -24.890000
3     15.360000
4     21.802857
5     35.798571
6     12.878571
7    -17.994286
8     54.258571
9     31.384286
10   -71.386667
11    60.948571
12    18.423333
13    58.990000
14    25.075714
15   -11.514286
16    12.510000
17   -54.880000
18   -22.130000
19     8.245714
20    71.896667
21    10.878571
22   -51.614286
23    99.705714
24   -18.304286
25   -39.300000
26     2.641429
27     5.060000
28    -3.814286
29    48.357143
30    42.423333
31    48.602857
dtype: float64
Sum of All Differences: 1026.4428569
Sum of Squared Differences: 51029.10812112315
correlation: 0.8250897929685962

print("difference:", preg - pg)
print("Sum of All Differences:", np.sum(np.abs(preg - pg)))
print("Sum of Squared Differences:", np.sum(np.square(preg - pg)))
print("correlation:", np.corrcoef(np.array((preg, pg)))[0, 1])

difference: 0    -1.346471
1     0.131765
2    -1.464118
3     0.903529
4     1.268235
5     2.062941
6     0.714706
7    -1.087059
8     3.148824
9     1.874706
10   -4.165882
11    3.542353
12    1.117059
13    3.470000
14    1.446471
15   -0.705882
16    0.735882
17   -3.228235
18   -1.301765
19    0.456471
20    4.195882
21    0.597059
22   -3.064706
23    5.836471
24   -1.105294
25   -2.311765
26    0.198235
27    0.297647
28   -0.252941
29    2.858824
30    2.528824
31    2.844706
dtype: float64
Sum of All Differences: 60.26470588235294
Sum of Squared Differences: 175.26139377162627
correlation: 0.825700621440239
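The two correlations are expected to agree because dividing both series by the same positive constant (17) does not change a Pearson correlation. The small gap between 0.8251 and 0.8257 is likely due to rounding: `Pts/G` is stored to one decimal place, so it is not exactly `Point Projection / 17`. The sketch below illustrates the scale invariance with a hand-written `pearson` helper (a hypothetical name) on the first five rows of the table above, and also checks that squaring the reported per-game correlation reproduces the coefficient of determination from the single-predictor regression.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Pred and Point Projection for the first five teams in the table
pred = [403.81, 417.04, 366.11, 244.86, 383.66]
proj = [427.428571, 415.285714, 391.0, 229.5, 361.857143]

r_season = pearson(pred, proj)
r_game = pearson([p / 17 for p in pred], [p / 17 for p in proj])
print(abs(r_season - r_game) < 1e-12)  # True

# Values reported in the notebook output
r_per_game_reported = 0.825700621440239
r_squared_reported = 0.6817815162467973
print(abs(r_per_game_reported ** 2 - r_squared_reported) < 1e-9)  # True
```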
### New DataFrame
Make a new dataframe that displays the difference for each team.
The difference used is the prediction minus the projection.
Therefore, a negative number means that a team is conceding more points than predicted, and a positive number means that a team is conceding fewer points than predicted.
This model has shown that 8 teams out of 32 have deviated from the prediction by more than 3.01 points per game (3 points for a field goal) and none are 7 or more points away.
This can be considered a success, as the majority of teams' points allowed per game are within a field goal of the prediction.
In addition, situational factors could explain some of the deviation, such as a team attempting a 4th-down conversion late in a game instead of kicking a field goal and scoring a touchdown, which would be unlikely to happen earlier in the game. Similarly, turnovers and poor field position are other factors that have not been included in this model and could account for some of the deviation.
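The field-goal and touchdown counts quoted above can be reproduced directly from the per-game differences. This is a small sketch in plain Python, with the 32 values copied from the `P/G Diff` column; the constant names are illustrative.

```python
FIELD_GOAL = 3   # points for a field goal
TOUCHDOWN = 7    # touchdown plus extra point

# P/G Diff (prediction minus points allowed per game), in team order
diffs = [
    -1.346471, 0.131765, -1.464118, 0.903529, 1.268235, 2.062941,
    0.714706, -1.087059, 3.148824, 1.874706, -4.165882, 3.542353,
    1.117059, 3.470000, 1.446471, -0.705882, 0.735882, -3.228235,
    -1.301765, 0.456471, 4.195882, 0.597059, -3.064706, 5.836471,
    -1.105294, -2.311765, 0.198235, 0.297647, -0.252941, 2.858824,
    2.528824, 2.844706,
]

beyond_field_goal = [d for d in diffs if abs(d) > FIELD_GOAL]
beyond_touchdown = [d for d in diffs if abs(d) >= TOUCHDOWN]

print(len(diffs), len(beyond_field_goal), len(beyond_touchdown))  # 32 8 0
```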
Teams = ('Arizona Cardinals', 'Atlanta Falcons', 'Baltimore Ravens', 'Buffalo Bills',
         'Carolina Panthers', 'Chicago Bears', 'Cincinnati Bengals', 'Cleveland Browns',
         'Dallas Cowboys', 'Denver Broncos', 'Detroit Lions', 'Green Bay Packers',
         'Houston Texans', 'Indianapolis Colts', 'Jacksonville Jaguars', 'Kansas City Chiefs',
         'Las Vegas Raiders', 'Los Angeles Chargers', 'Los Angeles Rams', 'Miami Dolphins',
         'Minnesota Vikings', 'New England Patriots', 'New Orleans Saints', 'New York Giants',
         'New York Jets', 'Philadelphia Eagles', 'Pittsburgh Steelers', 'San Francisco 49ers',
         'Seattle Seahawks', 'Tampa Bay Buccaneers', 'Tennessee Titans', 'Washington Commanders', )
df1 = pd.DataFrame(columns=['Teams'])
df1['Teams'] = Teams
diff = preg - pg
df1['P/G Diff'] = diff
df1

| | Teams | P/G Diff |
|---|---|---|
| 0 | Arizona Cardinals | -1.346471 |
| 1 | Atlanta Falcons | 0.131765 |
| 2 | Baltimore Ravens | -1.464118 |
| 3 | Buffalo Bills | 0.903529 |
| 4 | Carolina Panthers | 1.268235 |
| 5 | Chicago Bears | 2.062941 |
| 6 | Cincinnati Bengals | 0.714706 |
| 7 | Cleveland Browns | -1.087059 |
| 8 | Dallas Cowboys | 3.148824 |
| 9 | Denver Broncos | 1.874706 |
| 10 | Detroit Lions | -4.165882 |
| 11 | Green Bay Packers | 3.542353 |
| 12 | Houston Texans | 1.117059 |
| 13 | Indianapolis Colts | 3.470000 |
| 14 | Jacksonville Jaguars | 1.446471 |
| 15 | Kansas City Chiefs | -0.705882 |
| 16 | Las Vegas Raiders | 0.735882 |
| 17 | Los Angeles Chargers | -3.228235 |
| 18 | Los Angeles Rams | -1.301765 |
| 19 | Miami Dolphins | 0.456471 |
| 20 | Minnesota Vikings | 4.195882 |
| 21 | New England Patriots | 0.597059 |
| 22 | New Orleans Saints | -3.064706 |
| 23 | New York Giants | 5.836471 |
| 24 | New York Jets | -1.105294 |
| 25 | Philadelphia Eagles | -2.311765 |
| 26 | Pittsburgh Steelers | 0.198235 |
| 27 | San Francisco 49ers | 0.297647 |
| 28 | Seattle Seahawks | -0.252941 |
| 29 | Tampa Bay Buccaneers | 2.858824 |
| 30 | Tennessee Titans | 2.528824 |
| 31 | Washington Commanders | 2.844706 |
### Bar Chart
Create a bar chart to show the variation in the prediction compared to the projection for each team.

plt.figure(figsize=(10,5))
plt.bar(df1['Teams'], df1['P/G Diff'],
        color = ['firebrick','darkred','purple','blue','cyan', 'orange', 'brown','darkorange',
                 'midnightblue', 'darkorange', 'cornflowerblue', 'forestgreen', 'darkblue', 'blue',
                 'turquoise', 'red','black','dodgerblue','blue','aqua', 'darkviolet','midnightblue',
                 'black', 'blue','darkgreen','lime','black','darkred','darkslategray','darkred',
                 'darkturquoise','maroon' ])
plt.axhline(y = 0, color = 'black', linestyle = '-')
plt.xticks(rotation=90)
# plt.show must be called with parentheses, otherwise only the function object is echoed
plt.show()
df1.to_csv(r'C:\Users\Rob\Documents\DPredDiff.csv', index = False)